Adaptive Load Balancing in MapReduce using Flubber
Authors
Abstract
MapReduce has emerged as a successful framework for addressing the heavy demand for large-scale analytical data processing in this petabyte age. However, while the sheer size of the data makes problems more challenging, the flexibility offered by MapReduce frameworks makes the learning curve far steeper than expected. The general idea behind a MapReduce framework is to split the task into two components: a mapper and a reducer. The mapper executes a user-defined computation on chunks of data and generates results, while the reducer groups the results together based on a common attribute. Scalability, hence, appears as an inherent trait of the design. A critical parameter in this configuration is the number of reducers required for a given task, and frameworks like Hadoop expect the user to specify this parameter when submitting a job. In this report, we focus on Hadoop and argue that deciding the number of reducers is a non-trivial task, let alone deciding it prior to running the job. To address this issue, we present Flubber – a simple pre-job that can be sandwiched between the original job and Hadoop. Given a couple of user-supplied parameters, it takes a stab at figuring out the ideal number of reducers for the given job.
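The abstract does not detail Flubber's estimation procedure, but the kind of heuristic at stake can be sketched: derive a reducer count from the job's input volume, targeting a fixed amount of data per reducer and capping at a cluster-wide limit. Everything below (the function name and the default parameter values) is an illustrative assumption, not Flubber's actual algorithm; in Hadoop, the resulting count would be supplied via `Job.setNumReduceTasks()` or the `mapreduce.job.reduces` property.

```python
import math

def estimate_num_reducers(input_bytes: int,
                          bytes_per_reducer: int = 1 << 30,
                          max_reducers: int = 1000) -> int:
    """Hypothetical heuristic: aim for roughly `bytes_per_reducer`
    of data per reducer, capped at `max_reducers`.

    All names and defaults here are illustrative assumptions made
    for this sketch, not Flubber's published algorithm.
    """
    if input_bytes <= 0:
        return 1  # Hadoop still needs at least one reducer for a reduce phase
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))

# Example: a 5 GiB input with a 1 GiB-per-reducer target yields 5 reducers.
print(estimate_num_reducers(5 * (1 << 30)))  # → 5
```

A real pre-job would also have to account for the map output size (which can differ greatly from the input size) and for skew in the key distribution, which is precisely why the report argues the choice is non-trivial.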
Similar resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Comparative Study of Load Balance Algorithms for the MapReduce Environment
MapReduce is a famous model for data-intensive parallel computing in shared-nothing clusters. One of the main issues in MapReduce is that its performance depends mainly on data distribution. MapReduce contains a simple load-balancing technique based on a FIFO job scheduler that serves jobs in their submission order, but unfortunately it is insufficient in real-world cases as it missed man...
An Improved Technique Of Extracting Frequent Itemsets From Massive Data Using MapReduce
The mining of frequent itemsets is a basic and essential task in many data mining applications. Extracting frequent itemsets, together with frequent patterns and rules, supports applications such as association rule mining and correlation analysis in product sales and marketing. The extraction of frequent itemsets uses a number of algorithms, like FP-growth and E-clat, but unfortunately these algorit...
Jumbo: Beyond MapReduce for Workload Balancing
Over the past decade, several frameworks such as Google MapReduce have been developed that allow data processing at unprecedented scale due to their high scalability and fault tolerance. However, these systems present both new and existing challenges for workload balancing that have not yet been fully explored. The MapReduce model in particular has some inherent limitations when it comes to wo...
ROUTE: Run-Time Robust Reducer Workload Estimation for MapReduce
MapReduce has become a popular model for large-scale data processing in recent years. Many works on MapReduce scheduling (e.g., load balancing and deadline-aware scheduling) have emphasized the importance of predicting workload received by individual reducers. However, because the input characteristics and user-specified map function of a given job are unknown to the MapReduce framework before ...